packages <- c("raster", "maps", "mapview", "tidyverse", "viridis", "sp", "rasterVis")
#lapply(packages, install.packages)
lapply(packages, library, character.only = T)
First, let’s grab some data that we can explore visually!
DATA <- read.csv("Ticks.csv", header = T)
TICKS!!! This is part of a dataset looking at the association among fire ants, ticks, small mammals, and tick-borne diseases (Decreased small mammal and on-host tick abundance in association with invasive red imported fire ants (Solenopsis invicta), Castellanos et al. 2016).
It is generally always good practice to take a look at your data beforehand to see what you are dealing with (check dimensions, see what the first few rows look like, and check out a summary of the data and what class each column is).
dim(DATA)
## [1] 286 21
head(DATA)
## Date Year Season PlotID LabelID Transect Treatment TrapID Weight
## 1 10/2013 2013 Fall GRR GRR 13102001 T2 Treated A3 26.0
## 2 10/2013 2013 Fall GRR GRR 13102002 T2 Treated C13 11.0
## 3 11/2013 2013 Fall GRR GRR 13111601 T2 Treated C2 10.0
## 4 11/2013 2013 Fall GRR GRR 13111602 T2 Treated C11 196.0
## 5 11/2013 2013 Fall GRR GRR 13111603 T2 Treated C20 9.0
## 6 11/2013 2013 Fall GRR GRR 13111604 UT1 Untreated C19 31.5
## ScientificName Sex Recapture NDRecapture DateLastCaught
## 1 Chaetodipus hispidus M N N
## 2 Reithrodontomys fulvescens M N N
## 3 Baiomys taylori F N N
## 4 Sigmodon hispidus F N N
## 5 Reithrodontomys fulvescens M N N
## 6 Chaetodipus hispidus M N N
## EarTagNumber LREarTag TickPresence TotalTicks TickSpecies
## 1 172 L 0 0
## 2 173 L 1 5 Amblyomma maculatum
## 3 NA 0 0
## 4 31 R 0 0
## 5 32 L 0 0
## 6 33 L 0 0
## Life.Stage Mortality
## 1 N
## 2 LL N
## 3 N
## 4 N
## 5 N
## 6 N
summary(DATA)
## Date Year Season PlotID LabelID
## 2/2014 :72 Min. :2013 Fall : 12 GRR:286 : 15
## 4/2014 :43 1st Qu.:2014 Spring: 99 GR 14040501: 1
## 7/2014 :40 Median :2014 Summer: 65 GR 14040502: 1
## 12/2013:38 Mean :2014 Winter:110 GR 14040503: 1
## 3/2014 :35 3rd Qu.:2014 GR 14040504: 1
## 6/2014 :25 Max. :2014 GR 14040505: 1
## (Other):33 (Other) :266
## Transect Treatment TrapID Weight
## T1 : 90 Treated :206 B2 : 10 Min. : 2.00
## T2 :116 Untreated: 80 C15 : 9 1st Qu.: 7.00
## UT1: 53 A2 : 8 Median : 8.75
## UT2: 27 B9 : 8 Mean : 14.75
## C11 : 8 3rd Qu.: 12.00
## C12 : 8 Max. :197.50
## (Other):235 NA's :18
## ScientificName Sex Recapture NDRecapture
## Baiomys taylori :163 : 21 N :248 N:273
## Chaetodipus hispidus : 16 F :125 Y : 25 Y: 13
## Cryptotis parva : 9 M :138 Y?: 13
## Perognathus merriami : 1 NA's: 2
## Peromyscus leucopus : 11
## Reithrodontomys fulvescens: 75
## Sigmodon hispidus : 11
## DateLastCaught EarTagNumber LREarTag TickPresence
## :263 Min. : 1.00 :162 Min. :0.00000
## 2/9/14 : 8 1st Qu.: 43.75 L : 55 1st Qu.:0.00000
## 6/4/14 : 3 Median : 62.50 R : 65 Median :0.00000
## 11/16/13: 2 Mean : 69.28 NA's: 4 Mean :0.05594
## 12/5/13 : 2 3rd Qu.: 94.50 3rd Qu.:0.00000
## 2/10/14 : 2 Max. :173.00 Max. :1.00000
## (Other) : 6 NA's :166
## TotalTicks TickSpecies Life.Stage Mortality
## Min. :0.0000 :270 :270 N:278
## 1st Qu.:0.0000 Amblyomma maculatum: 16 LL: 12 Y: 8
## Median :0.0000 NN: 4
## Mean :0.1259
## 3rd Qu.:0.0000
## Max. :9.0000
##
Each row is a mammal that was processed for ticks.
ggplot2 has two general methods of plotting: qplot and ggplot
plot(DATA$ScientificName, DATA$Weight)
This is using the base::plot function.
qplot(data = DATA, x = ScientificName, y = Weight, geom = "boxplot")
ggplot2::qplot is ggplot2’s version of a quick plot, hence the name, coding denizens are lazy (see: every package that begins with “r”). You specify the data.frame (data = ), the x axis (x = ), the y axis (y = ), and the type of plot you want (geom = ). A quick note. ggplot2 only really likes data.frames. Don’t give the data argument a matrix and cry when it spits out a totally non nebulous error.
Now, I’d like to introduce you to my good frenemy, ggplot.
ggplot(data = DATA, aes(x = ScientificName, y = Weight)) + geom_boxplot()
The plot that results is the same as from qplot, but the way we got there is different (unless you haven’t noticed the ggplot2::aes function in the room). ggplot2::aes tells the plot the aesthetics (x and y axes, color, fill, groups, size, shape, etc.) that are used in other layers. After that, you add the plot in layers called by a geom_XXXX function.
ggplot(data = DATA, aes(x = Treatment, y = Weight, fill = ScientificName)) + geom_boxplot()
By just adding a fill aesthetic, we separated out each species’ weight by scientific name.
What else can you do? What mammals did we find?
BAR <- ggplot(data = DATA, aes(x = Transect, fill = ScientificName)) + geom_bar(col = "black")
BAR
Here, I just wanted to see what mammal species (fill = ScientificName) were found in each transect (x = Transect) and how many. I’m sure y’all are absolutely shocked that geom_bar brings up a bar plot.
“But Adrian,” you say. “Aren’t species names in italics?!” Correct, and ggplot2 fun shouldn’t come at the expense of taxonomic correctness.
BAR + theme(legend.text = element_text(face = "italic"))
Most of the minor changes to a plot, its axes, the legend, etc. are confined to the ggplot2::theme function. Go look at theme reference and weep internally at the all of the arguments listed for the function. Here, we are concerned with just the legend text (or legend.text, if you will). All adjustments to text items are done using the ggplot2::element_text function (P.S. if seeing text makes you squeamish, substitute element_text with element_blank()).
Alright, I’m going to do something horrible and unleash piping and dplyr.
DATA %>% group_by(Treatment, Date) %>%
summarize(n = n()) %>%
ggplot(aes(x = Treatment, y = n)) + geom_boxplot()
So the pipe symbol (%>%) takes whatever the result is from the left and places it as the first argument in the function on the right. You can take a look at it bit by bit (after each %>%). Overall, it makes the code a lot more readable.
Otherwise summarize(group_by(DATA, Treatment, Date), n = n()) is the other alternative just to get before the ggplot function.
Need another plot example? This time we’ll look at a batch of jittered points.
ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25)
Oh no! What happened?! Well, clearly what happened is that you are using the wrong symbol. Obviously. If you don’t want this to happen, you can also change “fill” to “color.” There’s a bunch of handy images on the ever magical Google image search for “pch r” that show what part of symbol is controlled by color and what part is controlled by fill.
ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21)
You’ll notice that I added a few things to the geom_jitter function. I specified no jitter to the height of a point, capped the jitter on the x axis, and changed the symbol of the points using pch.
But now I want to do something slightly more complicated by giving a line showing the mean weight of each group (species in treatment type). To do this, we will use the ggplot2::stat_summary function.
ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21) + stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., col = Treatment), width = 0.7, size = 0.75)
This function will summarize our data and plot it out depending on what we specify in the geom argument (can be a point, errorbar, crossbar, etc.). We want the mean of the y axis (hence the use of the fun.y argument) and only want a simple line, not an entire errorbar. To do this, we set the ymin and ymax to the same value as y (the ..y.. simply refers to the calculated value of y by the function).
This is the first clear evidence you have seen so far that ggplot2 works as a series of layers (don’t beleive me? put stat_summary in front of geom_jitter).
You can specify your own color palette in ggplot2::scale_fill_manual by giving the values argument a vector of colors.
COLS <- c("#009E73", "#CC79A7", "#D55E00", "#0072B2", "#F0E442", "#E69F00","#56B4E9") #colorblind friendly palette from http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
BAR + scale_fill_manual(name = "Species", values = COLS)
The name argument also allows you to change the title of the legend.
The ggplot2::labs function allows you to change the x and y axis titles and the main title and the use of \n in a string tells ggplot2 to make a hard return.
BAR + labs(y = "No. Individuals Caught", title = "Goliad\nMammal Captures")
BAR + scale_x_discrete(limits = c("T1", "UT1", "T2", "UT2"), labels = c("Treated 1", "Untreated 1", "Treated 2", "Untreated 2")) + scale_y_continuous(breaks = seq(0, 200, 15))
The scale functions are useful for modifying the scale however you want. They follow the pattern of scale_“axis in question”_“type of data.” Our x axis has discrete data (factors relating to each transect) while our y axis is continuous (number of captures). You can use the limits argument to reorder things and the labels argument to rename them.
You can also specify coordinates (x, y) to tell the legend where to be.
BAR + theme(legend.position = "bottom")
BAR + theme(legend.position = c(0.8, 0.75), legend.background = element_blank())
But what if you don’t like that traditional gray panel background (there are OPINIONS about it)? You can change it by giving it the ggplot2::element_rect function and telling it to change the fill (or by giving it element_blank()). You can also change the color of the grid lines by specifying a color argument in element_line.
BAR + theme(panel.background = element_rect(fill = "white"), panel.grid = element_line(color = "grey70"))
Do you hate Adobe Illustrator or similar programs for making figures because you forget everything each time you make a figure? Let’s see how to combine plots because hopefully things havent gone off the rails by now.
library(cowplot)
BAR
See what I mean by people having OPINIONS on how everything should look
BAR <- BAR + scale_y_continuous(expand = c(0, 0)) #this has been bothering me while writing all of this, so I'm finally fixing it (got rid of the space between the x axis and 0 by using the expand argument in a scale function)
SPEC <- c("B. taylori", "C. hispidus", "C. parva", "P. merriami", "P. leucopus", "R. fulvescens", "S. hispidus")
PLT <- ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21) + stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., col = Treatment), width = 0.7, size = 0.75) + scale_x_discrete(labels = SPEC) + theme(axis.text.x = element_text(size = 10, angle = 75, vjust = 0.5))
PLT
plot_grid(BAR, PLT, labels = c("A", "B"), align = "h")
The cowplot::plot_grid function will put your multiple plot objects in the same plot and can label them with the labels argument.
ggsave("HorribleAwfulPlot.pdf")
By default, ggplot2::ggsave saves the most recent plot you brought up (you can also give it a plot object in the plot argument) at the dimensions that it occurs. You can adjust dpi, width, and height in here as well if need be.
Check ?ggsave
No, for real. The help pages in R are invaluable. Most questions that people ask me are either solvable by taking the time to read the function help page or googling an error code (or things being the wrong class).
We will be introducing three types of data today, how they differ, and how to manipulate and visualize them.
BRTC <- read.csv("BRTCMammals.csv", header = T)
dim(BRTC)
## [1] 62415 15
summary(BRTC)
## name orderKey familyKey
## Peromyscus maniculatus: 3282 Min. : 731 Min. : 5297
## Tadarida brasiliensis : 3115 1st Qu.: 734 1st Qu.: 9366
## Peromyscus leucopus : 2130 Median :1459 Median : 9437
## Artibeus jamaicensis : 2019 Mean :1109 Mean :1155693
## Sigmodon hispidus : 1845 3rd Qu.:1459 3rd Qu.:3240723
## Peromyscus levipes : 1541 Max. :1494 Max. :7456382
## (Other) :48483
## stateProvince year country recordNumber
## Texas :18054 Min. :1929 United States:28364 1 : 319
## Chiapas : 2866 1st Qu.:1984 Mexico :17733 2 : 304
## Tamaulipas: 1720 Median :1991 Honduras : 5417 4 : 284
## Veracruz : 1540 Mean :1991 Canada : 1680 3 : 278
## Guerrero : 1409 3rd Qu.:1995 Guatemala : 1628 5 : 272
## (Other) :35661 Max. :2015 (Other) : 7515 (Other):58570
## NA's : 1165 NA's :58150 NA's : 78 NA's : 2388
## county locality
## Brewster : 1886 No specific locality : 661
## Brazos : 1193 7.4 mi SSW San Lorenzo : 365
## Hill : 682 Lancetilla, 40 m : 305
## Tyler : 587 10.8 mi S, 2.6 mi W Jicaro Galan: 301
## Culberson: 584 9.5 mi NW Melaque : 299
## (Other) :22855 (Other) :60089
## NA's :34628 NA's : 395
## preparations catalogNumber institutionCode
## skin | skull :35050 1 : 1 TCWC:62415
## alcoholic : 9092 10 : 1
## skeleton : 5253 100 : 1
## skull : 5107 1000 : 1
## skin in alcohol | body skeleton: 3336 10000 : 1
## (Other) : 4554 10001 : 1
## NA's : 23 (Other):62409
## decimalLongitude decimalLatitude geodeticDatum
## Min. :-148.21 Min. :-28.42 WGS84:15475
## 1st Qu.:-104.56 1st Qu.: 29.93 NA's :46940
## Median : -98.70 Median : 30.88
## Mean :-100.13 Mean : 32.64
## 3rd Qu.: -96.30 3rd Qu.: 34.95
## Max. : 61.81 Max. : 66.40
## NA's :46940 NA's :46940
These data are records from the Biodiversity Research and Teach Collection’s mammal collection here at TAMU gathered from GBIF.
BRTC %>% filter(stateProvince == "Texas") %$%
county %>%
unique %>%
length
## [1] 244
I wanted to see how many counties in Texas are represented by specimens in the BRTC. Using a pipeline (hissing from back of the room at the sight of %>%) is easier to read than length(unique(BRTC[BRTC$stateProvince %in% "Texas", "county"])) by breaking it down into steps and allowing you to read from left to right.
You may have also noticed the %$% symbol which is used to index similar to how BRTC$county would be.
But we want it in an easy spatial representation. Additionally, the sp package really dislikes NA values (which we have a ton of unfortunately), so let’s get rid of those too.
TEX <- BRTC %>% filter(stateProvince == "Texas" & !is.na(decimalLongitude))
TEX <- SpatialPointsDataFrame(coords = TEX[, 13:14], data = TEX, proj4string = crs("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
The sp::SpatialPointsDataFrame requires coordinates (decimalLongitude and decimalLatitude), data to be placed in the “DataFrame” portion of the object (TEX), and a coordinate reference system (all points from GBIF are WGS84).
Spatial objects from sp have a few different pieces to them that you can query via @ (this is an S4 object rather than an S3, so the usual $ doesn’t work at these upper levels) A few options are TEX@bbox (min-max coords), TEX@proj4string (CRS nonsense), TEX@coords (coordinates for each point), TEX@data (all of the rest of the data).
head(TEX)
## name orderKey familyKey stateProvince year country
## 1 Cryptotis parva 803 5534 Texas 2013 United States
## 2 Geomys attwateri 1459 9437 Texas 2011 United States
## 3 Geomys attwateri 1459 9437 Texas 2011 United States
## 4 Geomys breviceps 1459 9437 Texas 2011 United States
## 5 Geomys breviceps 1459 9437 Texas 2011 United States
## 6 Geomys breviceps 1459 9437 Texas 2011 United States
## recordNumber county locality
## 1 1 Wharton TT Ranch
## 2 317 Burleson Caldwell, 4251 FM 2001
## 3 316 Burleson Caldwell, 4251 FM 2000
## 4 24 Brazos 7466 Raymond Stotzer Parkway
## 5 331 Brazos 7466 Hwy 60
## 6 21 Brazos TAMU Sheep Center, Bryan
## preparations catalogNumber institutionCode
## 1 alcoholic 62533 TCWC
## 2 skin in alcohol | body skeleton 61912 TCWC
## 3 skin in alcohol | body skeleton 61911 TCWC
## 4 skin in alcohol | body skeleton 60888 TCWC
## 5 skin in alcohol | body skeleton 60858 TCWC
## 6 skin in alcohol | body skeleton 60860 TCWC
## decimalLongitude decimalLatitude geodeticDatum
## 1 -95.96709 29.33492 WGS84
## 2 -96.68737 30.61135 WGS84
## 3 -96.68737 30.61135 WGS84
## 4 -96.40987 30.56515 WGS84
## 5 -96.56444 30.80250 WGS84
## 6 -96.40913 30.56267 WGS84
summary(TEX)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## decimalLongitude -106.6309 61.80891
## decimalLatitude 20.5588 66.39965
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Number of points: 10101
## Data attributes:
## name orderKey familyKey
## Sigmodon hispidus :1022 Min. : 731 Min. : 5298
## Peromyscus leucopus : 844 1st Qu.:1459 1st Qu.: 9368
## Peromyscus gossypinus : 625 Median :1459 Median : 9679
## Geomys breviceps : 559 Mean :1306 Mean :1564320
## Reithrodontomys fulvescens: 418 3rd Qu.:1459 3rd Qu.:3240723
## Peromyscus maniculatus : 310 Max. :1459 Max. :3240723
## (Other) :6323
## stateProvince year country
## Texas :10101 Min. :1936 United States:10101
## Aguascalientes : 0 1st Qu.:1989 Argentina : 0
## Ahuachapán : 0 Median :2003 Australia : 0
## Al Hillah, Liwa: 0 Mean :1997 Azerbaijan : 0
## Alabama : 0 3rd Qu.:2004 Belize : 0
## Alajuela : 0 Max. :2013 Bermuda : 0
## (Other) : 0 NA's :9696 (Other) : 0
## recordNumber county
## 5 : 135 Hill : 661
## 4 : 134 Brazos : 559
## 1 : 133 Tyler : 542
## 3 : 125 Hardin : 499
## 2 : 122 Presidio: 497
## (Other):9082 Kerr : 484
## NA's : 370 (Other) :6859
## locality
## 14 mi SW Hunt, 2200 ft : 197
## 8 mi NE Candelaria : 166
## 7 mi S, 1 mi E Port Lavaca : 164
## 18 mi N, 4.5 mi E Oilton, Gates Lake Area: 132
## 6.5 mi N, 6.5 mi W Raymondville : 120
## 7 mi N, 4.5 mi W Raymondville : 104
## (Other) :9218
## preparations catalogNumber institutionCode
## skin | skull :5701 1015 : 1 TCWC:10101
## skeleton :1734 1016 : 1
## skull :1185 1017 : 1
## alcoholic : 601 1018 : 1
## skin in alcohol | body skeleton: 394 1027 : 1
## (Other) : 481 1028 : 1
## NA's : 5 (Other):10095
## decimalLongitude decimalLatitude geodeticDatum
## Min. :-106.63 Min. :20.56 WGS84:10101
## 1st Qu.: -99.57 1st Qu.:29.32
## Median : -97.61 Median :30.31
## Mean : -98.18 Mean :30.20
## 3rd Qu.: -96.20 3rd Qu.:31.02
## Max. : 61.81 Max. :66.40
##
As you can see, it acts pretty much like a regular data.frame would except for being high maintenance when you want something particular from it.
The good thing about these spatial classes is that they are easier to plot
plot(TEX)
#Well...looks like there are some points that aren't georeferenced correctly. Let's toss those out.
To get rid of points let’s introduce polygons in R.
CNTY <- raster::getData(country = "USA", level = 2) %>% .[.$NAME_1 %in% "Texas", ]
raster::getData is a great resource for easy to access administrative polygons. Here we grab county polygons for the entire United States and then whittle that down to only the Texas counties. Within the piping ecosystem, the . symbol represents the entire object resulting from the left, allowing us to index it like normal but without typing out CNTY.
CNTY
## class : SpatialPolygonsDataFrame
## features : 254
## extent : -106.6458, -93.50874, 25.83808, 36.50344 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 15
## names : OBJECTID, ID_0, ISO, NAME_0, ID_1, NAME_1, ID_2, NAME_2, HASC_2, CCN_2, CCA_2, TYPE_2, ENGTYPE_2, NL_NAME_2, VARNAME_2
## min values : 2528, 244, USA, United States, 44, Texas, 2528, Anderson, US.TX.AA, NA, , County, County, ,
## max values : 2781, 244, USA, United States, 44, Texas, 2781, Zavala, US.TX.ZV, NA, , County, County, ,
#254 features (the number of counties in the Republic of Texas)
class(CNTY)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
A SpatialPolygonsDataFrame unsurprisingly behaves very similarly to the SpatialPointsDataFrame (don’t run attributes on this unless you want to scroll around a bunch). The different slots that can be accessed by @ are data, polygons, plotOrder, bbox, and proj4string (I used the attributes function and scrolled for your sins).
But where were we?
OVER <- over(TEX, CNTY)
head(OVER, 20)
## OBJECTID ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 HASC_2
## 1 2768 244 USA United States 44 Texas 2768 Wharton US.TX.WN
## 2 2553 244 USA United States 44 Texas 2553 Burleson US.TX.BU
## 3 2553 244 USA United States 44 Texas 2553 Burleson US.TX.BU
## 4 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 5 2725 244 USA United States 44 Texas 2725 Robertson US.TX.RO
## 6 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 7 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 8 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 9 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 10 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 11 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 12 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 13 2548 244 USA United States 44 Texas 2548 Brazos US.TX.BZ
## 14 NA NA <NA> <NA> NA <NA> NA <NA> <NA>
## 15 2563 244 USA United States 44 Texas 2563 Chambers US.TX.CH
## 16 2603 244 USA United States 44 Texas 2603 Fisher US.TX.FH
## 17 2603 244 USA United States 44 Texas 2603 Fisher US.TX.FH
## 18 2603 244 USA United States 44 Texas 2603 Fisher US.TX.FH
## 19 2603 244 USA United States 44 Texas 2603 Fisher US.TX.FH
## 20 2620 244 USA United States 44 Texas 2620 Grimes US.TX.GM
## CCN_2 CCA_2 TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2
## 1 NA County County
## 2 NA County County
## 3 NA County County
## 4 NA County County
## 5 NA County County
## 6 NA County County
## 7 NA County County
## 8 NA County County
## 9 NA County County
## 10 NA County County
## 11 NA County County
## 12 NA County County
## 13 NA County County
## 14 NA <NA> <NA> <NA> <NA> <NA>
## 15 NA County County
## 16 NA County County
## 17 NA County County
## 18 NA County County
## 19 NA County County
## 20 NA County County
sp::over is a wonderful function that checks for records/attributes that are overlayed in both spatial objects you give it. It is a good way of grabbing only those data that exist in the geographic space of another object (e.g., you don’t want disgusting marine mammals ruining your Texas mammal dataset) Starship Troopers voice “Would you like to know more?” there are vignettes on the sp::over function if you yearn for knowledge/are afflicted by insomnia.
Grab only those points that are actually within Texas.
TEX <- TEX[-which(is.na(OVER[, 1])), ]
But did it work?
plot(TEX, pch = 19) #+ maps::map(add = T, database = "county", "texas")
Yes! (Hopefully it works for you as well, if not ¯_(ツ)_/¯). The maps::map function can add quick boundaries, the default setting is set to “world” but it can be switched to “state” or “county”
ggplot2 again!
ggplot() + geom_point(data = as.data.frame(TEX), aes(x = decimalLongitude, y = decimalLatitude), col = "#880000")
Gasp, colored points! Gasp, geom_point creates points! You’ll notice that we changed TEX back into a data.frame (because ggplot2 is high maintenance).
Points were cool, but now let’s use the ggplot2::geom_polygon function to add the polygon data. Briefly, all SpatialPolygonDataFrame objects require an x of long, y of lat, and group of group. I gave it no fill (via NA) and a dark grey border (take note that the border doesn’t show up in the legend because it is outside of the aes function)
MAP <- ggplot() + geom_polygon(data = CNTY, aes(x = long, y = lat, group = group), fill = NA, col = "grey20") + geom_point(data = as.data.frame(TEX), aes(x = decimalLongitude, y = decimalLatitude, col = "#880000"), pch = 19, alpha = 0.5)
## Regions defined for each Polygons
MAP
That legend doesn’t say very much.
MAP + scale_color_manual(name = "Legend", values = "#880000", labels = "Occurences") + theme(legend.position = c(0.2, 0.2), legend.background = element_blank())
When you are using the color (or fill, shape, size, etc.) aesthetic, you can change the legend using the ggplot2::scale_color_manual (or switch color for what you are interested in). Name changes the title, labels changes the legend labels, and values changes the actual color.
What if we wanted to see what counties aren’t represented by georeferenced BRTC specimens? What if we wanted to plot TWO polygons?
NORE <- CNTY@data %$% NAME_2 %>% setdiff(unique(TEX$county))
I used a pipeline to grab the county name (NAME_2) from the CNTY polygons and then checks for differences between the list of counties and those in our points dataset.
NORE <- CNTY[CNTY$NAME_2 %in% NORE, ]
NORE
## class : SpatialPolygonsDataFrame
## features : 55
## extent : -103.9819, -94.65408, 28.81567, 36.50344 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 15
## names : OBJECTID, ID_0, ISO, NAME_0, ID_1, NAME_1, ID_2, NAME_2, HASC_2, CCN_2, CCA_2, TYPE_2, ENGTYPE_2, NL_NAME_2, VARNAME_2
## min values : 2532, 244, USA, United States, 44, Texas, 2532, Archer, US.TX.AC, NA, , County, County, ,
## max values : 2758, 244, USA, United States, 44, Texas, 2758, Upton, US.TX.UT, NA, , County, County, ,
plot(NORE)
MAP + geom_polygon(data = NORE, aes(x = long, y = lat, group = group), fill = "gray50", col = "gray20")
ggplot2 acts as a series of layers, so the missing county layer goes on top. If you want it below everything, put it further upstream in the ggplot2 code.
Now we are going to move to the raster package to deal with raster data
ELEV <- raster::getData(name = "alt", country = "USA")[[1]]
We are using our good friend the raster::getData function to grab some elevation data for the US (the [[1]] is there because the result without it is a list of 4 RasterLayers and we only want what applies to the continental US).
ELEV
## class : RasterLayer
## dimensions : 3012, 6948, 20927376 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -124.8, -66.9, 24.4, 49.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84
## data source : /Users/Adrian/Desktop/USA1_msk_alt.grd
## names : USA1_msk_alt
## values : -171, 4275 (min, max)
You can see that it is a RasterLayer with a ton of cells. You can also see the resolution, extent, crs, and min/max values for the elevation.
plot(ELEV) #Yep, thats the continental US (and the Rocky Mountains!)
What can you do with rasters?
raster::extract(ELEV, TEX@coords)
As you can see, we used the raster::extract function to grab the elevation of the cell that each specimen matches to. Neat!
You can crop the raster to a certain geographic extent.
EXT <- extent(CNTY)
EXT
## class : Extent
## xmin : -106.6458
## xmax : -93.50874
## ymin : 25.83808
## ymax : 36.50344
Here, we just grabbed the geographic extent of the CNTY polygons using the raster::extent function (which basically just puts the CNTY@bbox info in a different object class). This isn’t really needed, but it can be a useful function for defining a study area.
plot(crop(ELEV, EXT))
plot(crop(ELEV, CNTY))
Both return the same result (the lack of Texas elevation) Butwhy.gif
?crop mentions that the y argument is an Extent object or something that can be used to create one. This is your other reminder that the help pages, their description of the arguments, and the examples at the bottom are EXTREMELY useful. Please read them.
You can also clip the raster to a polygon, but this can take a bit more time
Run the below line and then stretch or sit quietly, paralyzed by existential dread.
ELEV <- mask(crop(ELEV, CNTY), CNTY)
plot(ELEV)
I know y’all missed Texas, so I brought it back.
Now base plot is nice, but how do you plot rasters in ggplot2? You could use a combination of rasterVis::gplot and ggplot2::geom_tile, but that may lead to plotting issues. Using raster::rasterToPoints can be slow, but will plot everything.
ELEV <- as.data.frame(rasterToPoints(ELEV))
names(ELEV)[3] <- "value"
ggplot(ELEV) + geom_raster(aes(x = x, y = y, fill = value))
As you can see, it filled the plot in according to the value. Don’t like blue?
RAS <- ggplot(ELEV) + geom_tile(aes(x = x, y = y, fill = value)) + scale_fill_viridis(option = "D", name = "Elevation (m)", na.value = "grey95")
RAS
Well, isn’t that nice? The viridis package makes the viridis color scheme usable in R by scale_XXX_viridis. You may have noticed the option argument in scale_fill_viridis. Check out what happens if you change it to “A”, “B”, or “C”
Let’s add some polygons (and someone asked for contours, so let’s add that in too).
RAS + geom_polygon(data = CNTY, aes(x = long, y = lat, group = group), fill = NA, col = "grey20") + geom_contour(data = ELEV, aes(x = x, y = y, z = value), binwidth = 200, col = "white", alpha = 0.6)
There you go. An elevation map of Texas counties with contour lines for every 200m increase in elevation.